We present a machine-learning framework to accurately characterize morphologies of Active Galactic Nucleus (AGN) host galaxies within $z<1$. We first use PSFGAN to decouple host galaxy light from the central point source, then we invoke the Galaxy Morphology Network (GaMorNet) to estimate whether the host galaxy is disk-dominated, bulge-dominated, or indeterminate. Using optical images from five bands of the HSC Wide Survey, we build models independently in three redshift bins: low $(0<z<0.25)$, medium $(0.25<z<0.5)$, and high $(0.5<z<1.0)$. By first training on a large number of simulated galaxies, then fine-tuning using far fewer classified real galaxies, our framework predicts the actual morphology for $\sim$ $60\%-70\%$ host galaxies from test sets, with a classification precision of $\sim$ $80\%-95\%$, depending on redshift bin. Specifically, our models achieve disk precision of $96\%/82\%/79\%$ and bulge precision of $90\%/90\%/80\%$ (for the 3 redshift bins), at thresholds corresponding to indeterminate fractions of $30\%/43\%/42\%$. The classification precision of our models has a noticeable dependency on host galaxy radius and magnitude. No strong dependency is observed on contrast ratio. Comparing classifications of real AGNs, our models agree well with traditional 2D fitting with GALFIT. The PSFGAN+GaMorNet framework does not depend on the choice of fitting functions or galaxy-related input parameters, runs orders of magnitude faster than GALFIT, and is easily generalizable via transfer learning, making it an ideal tool for studying AGN host galaxy morphology in forthcoming large imaging survey.
translated by 谷歌翻译
Blind image super-resolution (Blind-SR) aims to recover a high-resolution (HR) image from its corresponding low-resolution (LR) input image with unknown degradations. Most of the existing works design an explicit degradation estimator for each degradation to guide SR. However, it is infeasible to provide concrete labels of multiple degradation combinations (\eg, blur, noise, jpeg compression) to supervise the degradation estimator training. In addition, these special designs for certain degradation, such as blur, impedes the models from being generalized to handle different degradations. To this end, it is necessary to design an implicit degradation estimator that can extract discriminative degradation representation for all degradations without relying on the supervision of degradation ground-truth. In this paper, we propose a Knowledge Distillation based Blind-SR network (KDSR). It consists of a knowledge distillation based implicit degradation estimator network (KD-IDE) and an efficient SR network. To learn the KDSR model, we first train a teacher network: KD-IDE$_{T}$. It takes paired HR and LR patches as inputs and is optimized with the SR network jointly. Then, we further train a student network KD-IDE$_{S}$, which only takes LR images as input and learns to extract the same implicit degradation representation (IDR) as KD-IDE$_{T}$. In addition, to fully use extracted IDR, we design a simple, strong, and efficient IDR based dynamic convolution residual block (IDR-DCRB) to build an SR network. We conduct extensive experiments under classic and real-world degradation settings. The results show that KDSR achieves SOTA performance and can generalize to various degradation processes. The source codes and pre-trained models will be released.
translated by 谷歌翻译
Recent cross-lingual cross-modal works attempt to extend Vision-Language Pre-training (VLP) models to non-English inputs and achieve impressive performance. However, these models focus only on understanding tasks utilizing encoder-only architecture. In this paper, we propose ERNIE-UniX2, a unified cross-lingual cross-modal pre-training framework for both generation and understanding tasks. ERNIE-UniX2 integrates multiple pre-training paradigms (e.g., contrastive learning and language modeling) based on encoder-decoder architecture and attempts to learn a better joint representation across languages and modalities. Furthermore, ERNIE-UniX2 can be seamlessly fine-tuned for varieties of generation and understanding downstream tasks. Pre-trained on both multilingual text-only and image-text datasets, ERNIE-UniX2 achieves SOTA results on various cross-lingual cross-modal generation and understanding tasks such as multimodal machine translation and multilingual visual question answering.
translated by 谷歌翻译
传统的LIDAR射测(LO)系统主要利用从经过的环境获得的几何信息来注册激光扫描并估算Lidar Ego-Motion,而在动态或非结构化环境中可能不可靠。本文提出了Inten-loam,一种低饮用和健壮的激光镜和映射方法,该方法完全利用激光扫描的隐式信息(即几何,强度和时间特征)。扫描点被投影到圆柱形图像上,这些图像有助于促进各种特征的有效和适应性提取,即地面,梁,立面和反射器。我们提出了一种新型基于强度的点登记算法,并将其纳入LIDAR的探光仪,从而使LO系统能够使用几何和强度特征点共同估计LIDAR EGO-MOTION。为了消除动态对象的干扰,我们提出了一种基于时间的动态对象删除方法,以在MAP更新之前过滤它们。此外,使用与时间相关的体素网格滤波器组织并缩减了本地地图,以维持当前扫描和静态局部图之间的相似性。在模拟和实际数据集上进行了广泛的实验。结果表明,所提出的方法在正常驾驶方案中实现了类似或更高的精度W.R.T,在非结构化环境中,最先进的方法优于基于几何的LO。
translated by 谷歌翻译
基于CNN的大多数超分辨率(SR)方法假设降解是已知的(\ eg,bicubic)。当降解与假设不同时,这些方法将遭受严重的性能下降。因此,一些方法试图通过多种降解的复杂组合来培训SR网络,以涵盖实际的降解空间。为了适应多个未知降解,引入显式降解估计器实际上可以促进SR性能。然而,以前的显式降解估计方法通常可以通过对地面模糊内核的监督来预测高斯的模糊,并且估计错误可能导致SR失败。因此,有必要设计一种可以提取隐式歧视性降解表示的方法。为此,我们提出了一个基于元学习的区域退化意识SR网络(MRDA),包括元学习网络(MLN),降级提取网络(DEN)和区域退化意识SR Network(RDAN)。为了处理缺乏地面污染的降解,我们使用MLN在几次迭代后快速适应特定的复合物降解并提取隐式降解信息。随后,教师网络MRDA $ _ {T} $旨在进一步利用MLN为SR提取的降解信息。但是,MLN需要在配对的低分辨率(LR)和相应的高分辨率(HR)图像上进行迭代,这在推理阶段不可用。因此,我们采用知识蒸馏(KD)来使学生网络学会直接提取与LR图像的老师相同的隐式退化表示(IDR)。
translated by 谷歌翻译
基于参考的超分辨率(REFSR)在使用外部参考(REF)图像产生现实纹理方面取得了重大进展。然而,现有的REFSR方法可以获得与输入大小一起消耗二次计算资源的高质量对应匹配,限制其应用程序。此外,这些方法通常遭受低分辨率(LR)图像和REF图像之间的比例错位。在本文中,我们提出了一种加速的多尺度聚合网络(AMSA),用于基于参考的超分辨率,包括粗略嵌入式斑块(CFE-PACKPMATCH)和多尺度动态聚合(MSDA)模块。为了提高匹配效率,我们设计一种具有随机样本传播的新型嵌入式PACKMTH方案,其涉及具有渐近线性计算成本的端到端训练到输入大小。为了进一步降低计算成本和加速会聚,我们在构成CFE-PACKMATCH的嵌入式PACKMACTH上应用了粗略策略。为了完全利用跨多个尺度的参考信息并增强稳定性的稳定性,我们开发由动态聚合和多尺度聚合组成的MSDA模块。动态聚合通过动态聚合特征来纠正轻微比例的错位,并且多尺度聚合通过融合多尺度信息来为大规模错位带来鲁棒性。实验结果表明,该拟议的AMSA对定量和定性评估的最先进方法实现了卓越的性能。
translated by 谷歌翻译
非本地注意力(NLA)通过利用自然图像中的内在特征相关性来带来单幅图像超分辨率(SISR)的显着改进。然而,NLA提供嘈杂的信息大量的权重,并且相对于输入大小消耗二次计算资源,限制其性能和应用。在本文中,我们提出了一种新的高效非局部对比度注意(Enca),以执行远程视觉建模并利用更相关的非局部特征。具体而言,Enca由两部分组成,有效的非本地注意力(Enla)和稀疏聚合。 ENLA采用内核方法来近似指数函数并获得线性计算复杂度。对于稀疏聚合,我们通过放大因子乘以专注于信息特征的输入,但近似的方差呈指数增加。因此,应用对比学习以进一步分离相关和无关的特征。为了展示Enca的有效性,我们通过在简单的骨干中添加一些模块来构建称为有效的非本地对比网络(ENLCN)的架构。广泛的实验结果表明,Enlcn对定量和定性评估的最先进方法达到了卓越的性能。
translated by 谷歌翻译
视觉问题应答(VQA)任务利用视觉图像和语言分析来回回答图像的文本问题。它是一个流行的研究课题,在过去十年中越来越多的现实应用。本文介绍了我们最近对AliceMind-MMU的研究(阿里巴巴的编码器 - 解码器来自Damo Academy - 多媒体理解的机器智能实验室),其比人类在VQA上获得相似甚至略微更好的结果。这是通过系统地改善VQA流水线来实现的,包括:(1)具有全面的视觉和文本特征表示的预培训; (2)与学习参加的有效跨模型互动; (3)一个新颖的知识挖掘框架,具有专门的专业专家模块,适用于复杂的VQA任务。处理不同类型的视觉问题,需要具有相应的专业知识在提高我们的VQA架构的表现方面发挥着重要作用,这取决于人力水平。进行了广泛的实验和分析,以证明新的研究工作的有效性。
translated by 谷歌翻译
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
translated by 谷歌翻译
Humans have internal models of robots (like their physical capabilities), the world (like what will happen next), and their tasks (like a preferred goal). However, human internal models are not always perfect: for example, it is easy to underestimate a robot's inertia. Nevertheless, these models change and improve over time as humans gather more experience. Interestingly, robot actions influence what this experience is, and therefore influence how people's internal models change. In this work we take a step towards enabling robots to understand the influence they have, leverage it to better assist people, and help human models more quickly align with reality. Our key idea is to model the human's learning as a nonlinear dynamical system which evolves the human's internal model given new observations. We formulate a novel optimization problem to infer the human's learning dynamics from demonstrations that naturally exhibit human learning. We then formalize how robots can influence human learning by embedding the human's learning dynamics model into the robot planning problem. Although our formulations provide concrete problem statements, they are intractable to solve in full generality. We contribute an approximation that sacrifices the complexity of the human internal models we can represent, but enables robots to learn the nonlinear dynamics of these internal models. We evaluate our inference and planning methods in a suite of simulated environments and an in-person user study, where a 7DOF robotic arm teaches participants to be better teleoperators. While influencing human learning remains an open problem, our results demonstrate that this influence is possible and can be helpful in real human-robot interaction.
translated by 谷歌翻译